A Discourse Search Engine Based on Rhetorical Structure Theory
Abstract
Representing a document as a bag-of-words and using keywords to retrieve relevant documents has seen great success in large-scale information retrieval systems such as Web search engines. The bag-of-words representation is computationally efficient and, with proper term weighting and document ranking methods, performs surprisingly well for such a simple document representation. However, it ignores the rich discourse structure of a document, which could provide useful clues when determining the relevance of a document to a given user query. We develop the first-ever Discourse Search Engine (DSE), which exploits the discourse structure in documents to overcome the limitations of bag-of-words document representations in information retrieval. We use Rhetorical Structure Theory (RST) to represent a document as a discourse tree that connects numerous elementary discourse units (EDUs) via discourse relations. Given a query, our discourse search engine can retrieve not only documents relevant to the query, but also individual statements from those documents that express discourse relations related to the query. We propose several ranking scores that consider the discourse structure of the documents to measure the relevance of a pair of EDUs to a query. Moreover, we combine those individual relevance scores using a random decision forest (RDF) model to create a single relevance score. Despite the numerous challenges of constructing a rich document representation from the discourse relations in a document, our experimental results show that it improves the F-score in an information retrieval task. We publicly release our manually annotated test collection to expedite future research in discourse-based information retrieval.
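The idea of retrieving pairs of EDUs, rather than whole documents, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `RelationInstance` structure, the relation labels, the sample sentences, and the simple term-overlap score are hypothetical and are not the ranking scores or the random decision forest combination described in the abstract.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationInstance:
    """Two EDUs joined by one discourse relation (hypothetical structure)."""
    doc_id: str
    relation: str  # illustrative label such as "CAUSE" or "CONTRAST"
    left_edu: str
    right_edu: str


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric terms of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, inst: RelationInstance) -> float:
    """Fraction of query terms found in either EDU of the pair.

    A stand-in score for illustration only, not one of the paper's scores.
    """
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    edu_terms = tokens(inst.left_edu) | tokens(inst.right_edu)
    return len(query_terms & edu_terms) / len(query_terms)


def search(query: str, index: list[RelationInstance], top_k: int = 3):
    """Return up to top_k (score, instance) pairs with a non-zero score."""
    scored = [(overlap_score(query, inst), inst) for inst in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample data: three EDU pairs drawn from two documents.
SAMPLE_INDEX = [
    RelationInstance("d1", "CAUSE", "The river flooded",
                     "because heavy rain fell overnight"),
    RelationInstance("d1", "CONTRAST", "The town was evacuated",
                     "but the bridge stayed open"),
    RelationInstance("d2", "ELABORATION", "Heavy rain is common in spring",
                     "especially near the coast"),
]

if __name__ == "__main__":
    for score, inst in search("heavy rain flooded", SAMPLE_INDEX):
        print(f"{score:.2f} {inst.doc_id} {inst.relation}: "
              f"{inst.left_edu} | {inst.right_edu}")
```

Each hit carries its relation label and both EDUs, so a result can be shown as a statement pair with its discourse relation instead of only a document identifier. A learned combination of several such scores, as the abstract describes, would replace `overlap_score`.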
Related resources
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
The RST Basque TreeBank: an online search interface to check rhetorical relations
This paper introduces the first Basque discourse TreeBank annotated with rhetorical relations following Rhetorical Structure Theory. We report the main features of the corpus, such as the annotation criteria, inter-annotator agreement and harmonization procedure. We describe an online search system to check the annotation of discourse relations.
Automatic discourse structure generation using rhetorical structure theory
This thesis addresses a difficult problem in text processing: creating a system to automatically derive rhetorical structures of text. Although the rhetorical structure has proven to be useful in many fields of text processing such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is be...
Hierarchical Discourse Parsing Based on Similarity Metrics
Attentional State Theory and Rhetorical Structure Theory are two predominant theories of discourse parsing. In this paper, we combine the two and describe a novel approach to discourse parsing. The resulting discourse tree structure retains the following properties: the structure of purpose from Attentional State Theory and the relations between sentences from Rhetorical Structure Theory. We d...
Rhetorical Structure Analysis of EFLs’ Written Narratives of a Picture Story
This study was set to reveal how second language learners use rhetorical relations in their written narratives in terms of Rhetorical Structure Theory (RST) primarily proposed by Mann & Thompson (1987) and developed by Mann, Matthiessen & Thompson (1992). To this end, sixty written narratives based on the picture story book ‘Frog, where are you?’ were collected from EFL learners and were put to...
Publication date: 2015